The TALP-UPC phrase-based translation systems for WMT12: Morphology simplification and domain adaptation
Abstract
This paper describes the UPC participation in the WMT12 evaluation campaign. All systems presented are based on standard phrase-based Moses systems. Variations on the baseline adopt several improvement techniques, namely morphology simplification and generation, and domain adaptation. Morphology simplification addresses the data sparsity problem that arises when translating into morphologically rich languages such as Spanish: the system first translates into a morphology-simplified language and then leaves morphology generation to an independent classification task. The domain adaptation approach improves the SMT system by adding new translation units learned from the alignment between MT output and reference translations. Results show an improvement in TER, METEOR, NIST and BLEU scores over our baseline system; on the official test set, domain adaptation yields larger gains than the morphological generalization method.
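The two-step idea in the abstract (translate into a morphology-simplified language, then restore morphology as a separate classification task) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the tiny lexicon, the `lemma|TAG` token format, the function names `simplify` and `generate`, and the rule that stands in for a trained classifier are all hypothetical and are not taken from the paper.

```python
# Toy illustration only: the lexicon, tags, and the rule-based "classifier"
# below are hypothetical stand-ins, not the system described in the abstract.

# Hypothetical analysis table: surface form -> (lemma, coarse tag, morphology)
ANALYSES = {
    "canto": ("cantar", "VERB", "1sg"),
    "cantas": ("cantar", "VERB", "2sg"),
    "cantamos": ("cantar", "VERB", "1pl"),
}

# Inverse table used by the generation step: (lemma, morphology) -> surface form
SURFACE = {(lemma, morph): form for form, (lemma, _tag, morph) in ANALYSES.items()}

# Hypothetical context cue: subject pronoun -> person/number label
SUBJECT_TO_MORPH = {"yo": "1sg", "tú": "2sg", "nosotros": "1pl"}


def simplify(tokens):
    """Step 1: replace inflected verbs with a 'lemma|TAG' token."""
    out = []
    for tok in tokens:
        if tok in ANALYSES:
            lemma, tag, _morph = ANALYSES[tok]
            out.append(f"{lemma}|{tag}")
        else:
            out.append(tok)
    return out


def generate(tokens):
    """Step 2: restore morphology, predicting the label from the context."""
    out = []
    for i, tok in enumerate(tokens):
        if tok.endswith("|VERB"):
            lemma = tok.split("|")[0]
            # Stand-in for a trained classifier: look at the previous token.
            prev = tokens[i - 1] if i > 0 else ""
            morph = SUBJECT_TO_MORPH.get(prev)
            # Fall back to the bare lemma when no label can be predicted.
            out.append(SURFACE.get((lemma, morph), lemma))
        else:
            out.append(tok)
    return out


simplified = simplify(["nosotros", "cantamos"])
restored = generate(simplified)
```

In this sketch the translation model would only ever see the simplified tokens, so the three verb forms collapse into a single vocabulary entry, which is the sparsity reduction the abstract refers to. The generation step is deliberately trivial here; a real system would replace the lookup on the previous token with a classifier trained on richer context features.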
Similar resources
The TALP-UPC Phrase-Based Translation Systems for WMT13: System Combination with Morphology Generation, Domain Adaptation and Corpus Filtering
This paper describes the TALP participation in the WMT13 evaluation campaign. Our participation is based on the combination of several statistical machine translation systems built on standard phrase-based Moses systems. Variations include techniques such as morphology generation, training sentence filtering, and domain adaptation through unit derivation. The results show a coherent improvement...
The TALP-UPC Phrase-Based Translation System for EACL-WMT 2009
This study presents the TALP-UPC submission to the EACL Fourth Workshop on Statistical Machine Translation 2009 evaluation campaign. It outlines the architecture and configuration of the 2009 phrase-based statistical machine translation (SMT) system, putting emphasis on the major novelty of this year: combination of SMT systems implementing different word reordering algorithms. Traditionally, w...
The TALP-UPC Spanish-English WMT Biomedical Task: Bilingual Embeddings and Char-based Neural Language Model Rescoring in a Phrase-based System
This paper describes the TALP–UPC system in the Spanish–English WMT 2016 biomedical shared task. Our system is a standard phrase-based system enhanced with vocabulary expansion using bilingual word embeddings and a character-based neural language model with rescoring. The former focuses on resolving out-of-vocabulary words, while the latter enhances the fluency of the system. The two modules prog...
The TALP&I2r SMT systems for IWSLT 2008
This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the 2008 systems’ architecture and outlines translation schemes we ...
QCRI at WMT12: Experiments in Spanish-English and German-English Machine Translation of News Text
We describe the systems developed by the team of the Qatar Computing Research Institute for the WMT12 Shared Translation Task. We used a phrase-based statistical machine translation model with several non-standard settings, most notably tuning data selection and phrase table combination. The evaluation results show that we rank second in BLEU and TER for Spanish-English, and in the top tier for...